Support multiple Fireworks deployments independently #514
Admit users from the waiting room if any deployment is healthy (previously the worst status across all deployments decided admission). With one deployment per model, and per country in the future, a degraded deployment for one model shouldn't block users whose model routes elsewhere. Also make the DEPLOYMENT_SCALING_UP cooldown per-deployment, so one deployment's 503 no longer poisons routing for the others.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
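For illustration, a per-deployment cooldown can be kept in a `Map` keyed by deployment path. This is a minimal sketch, not the code in this PR: the function names mirror the ones shown in the review flowchart, but their signatures, the `now` parameter, and the `COOLDOWN_MS` value are assumptions.

```typescript
// Hypothetical cooldown length; the real value is not stated in this PR.
const COOLDOWN_MS = 60_000;

// Deployment path -> timestamp (ms) at which that deployment's cooldown expires.
const cooldownUntil = new Map<string, number>();

// Record that one deployment answered 503 DEPLOYMENT_SCALING_UP.
// Only this deployment's entry is touched; others are unaffected.
function markDeploymentScalingUp(deploymentId: string, now: number = Date.now()): void {
  cooldownUntil.set(deploymentId, now + COOLDOWN_MS);
}

// True while the given deployment is still inside its own cooldown window.
function isDeploymentCoolingDown(deploymentId: string, now: number = Date.now()): boolean {
  const until = cooldownUntil.get(deploymentId);
  if (until === undefined) return false;
  if (now >= until) {
    cooldownUntil.delete(deploymentId); // window expired; drop the stale entry
    return false;
  }
  return true;
}
```

With a single global flag, a 503 from one deployment would divert every model to the standard API. Keying by deployment keeps the fallback local to the deployment that is actually scaling up.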
Greptile Summary
This PR refactors Fireworks deployment health checking and cooldown tracking to be per-deployment rather than global, so that a degraded or scaling-up deployment for one model doesn't block routing to other models. Key changes:
Confidence Score: 5/5
Safe to merge: changes are well-scoped, backwards-compatible, and covered by new and updated tests. The logic change in
No files require special attention.
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Waiting Room: getFireworksHealth] --> B[probe: fetch Prometheus metrics]
B --> C{deploymentIds empty?}
C -- yes --> D[return 'healthy']
C -- no --> E[classify samples, deploymentIds]
E --> F{for each deploymentId\nclassifyOne}
F --> G{any 'healthy'?}
G -- yes --> D
G -- no --> H{any 'degraded'?}
H -- yes --> I[return 'degraded'\ndo NOT admit]
H -- no --> J[return 'unhealthy'\ndo NOT admit]
subgraph classifyOne
K[KV blocks >= 0.98?] -- yes --> L[unhealthy]
K -- no --> M[5xx rate >= 10%?]
M -- yes --> L
M -- no --> N[prefill p90 > 1000ms?]
N -- yes --> O[degraded]
N -- no --> P[KV blocks >= 0.80?]
P -- yes --> O
P -- no --> Q[healthy]
end
subgraph createFireworksRequestWithFallback
R[isDeploymentCoolingDown deploymentId] -- cooling down --> S[standard Fireworks API]
R -- not cooling down --> T[custom deployment request]
T -- 503 DEPLOYMENT_SCALING_UP --> U[markDeploymentScalingUp deploymentId\ncooldown per-deployment Map]
U --> S
T -- other 5xx --> S
T -- success --> V[return response]
end
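The health-classification half of the flowchart above could be expressed roughly as follows. This is a sketch under stated assumptions: the thresholds and the any-healthy ordering come from the flowchart, while the `DeploymentMetrics` shape, its field names, and the array-based `classify` signature are hypothetical (the flowchart indicates the real function takes Prometheus samples and deployment IDs).

```typescript
type Health = "healthy" | "degraded" | "unhealthy";

// Hypothetical, already-parsed metrics for one deployment.
interface DeploymentMetrics {
  kvBlocksFraction: number; // KV block utilisation, 0..1
  errorRate5xx: number;     // share of responses that were 5xx, 0..1
  prefillP90Ms: number;     // p90 prefill latency in milliseconds
}

// Classify a single deployment using the thresholds from the flowchart.
function classifyOne(m: DeploymentMetrics): Health {
  if (m.kvBlocksFraction >= 0.98) return "unhealthy";
  if (m.errorRate5xx >= 0.10) return "unhealthy";
  if (m.prefillP90Ms > 1000) return "degraded";
  if (m.kvBlocksFraction >= 0.80) return "degraded";
  return "healthy";
}

// Best-of aggregation: one healthy deployment is enough to report "healthy".
function classify(deployments: DeploymentMetrics[]): Health {
  if (deployments.length === 0) return "healthy"; // no deployments configured
  const statuses = deployments.map(classifyOne);
  if (statuses.some((s) => s === "healthy")) return "healthy";
  if (statuses.some((s) => s === "degraded")) return "degraded";
  return "unhealthy";
}
```

Under the previous worst-of rule, a mix of one saturated and one idle deployment would have reported `unhealthy`; with this ordering it reports `healthy`, which is what lets the waiting room keep admitting users.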
Summary
- `DEPLOYMENT_SCALING_UP` cooldown is now per-deployment (keyed by deployment path), so one deployment's 503 no longer poisons routing for the others.
- `:sum_by_deployment` / `:avg_by_deployment` metric suffixes.

Test plan
- `fireworks-health.test.ts`: any-healthy, all-degraded, all-unhealthy cases
- `fireworks-deployment.test.ts`: per-deployment cooldown isolation + existing fallback cases (21 tests)
- `tsc --noEmit` clean

🤖 Generated with Claude Code